Archive Migration through Workflow Automation

Authors

  • Norbert Podhorszki
  • Bertram Ludäscher
  • Scott Klasky

Abstract

The Center for Plasma Edge Simulation project aims to automate the tedious tasks of simulation monitoring, data archival, and coupling of simulation codes using the Kepler scientific workflow environment. The technology has been successfully applied to migrate a 10 TB combustion data archive from NERSC to ORNL, a task for which no other automated solution was available. This paper describes the workflow, which migrates large files from mass storage systems using external tools and temporary staging on disk. It performs the different stages in a pipeline-parallel fashion, parallelizes file transfers, and uses special checkpointing so that the workflow is restartable and can retry operations that failed earlier. The advantage of such a workflow over specialized data migration services is its independence from specific systems: it can be adapted by configuring the external tools to be used. The advantage over scripts is robust execution (handling of failures and timeouts) and efficiency (parallelization wherever possible).
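
The ideas named in the abstract (pipelined stages, parallel transfers, and checkpointing for restart) can be illustrated with a minimal Python sketch. This is not the Kepler workflow described in the paper; it is an assumption-laden illustration only. The names `Checkpoint`, `migrate`, `stage`, and `transfer` are hypothetical, and `stage` and `transfer` stand in for whatever external tools would be configured. Timeout handling is omitted.

```python
import json
import os
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class Checkpoint:
    """Records finished (file, step) pairs so a rerun can skip them."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()
        self.done = set()
        if os.path.exists(path):
            with open(path) as fh:
                self.done = {tuple(pair) for pair in json.load(fh)}

    def is_done(self, name, step):
        with self.lock:
            return (name, step) in self.done

    def mark(self, name, step):
        with self.lock:
            self.done.add((name, step))
            with open(self.path, "w") as fh:
                json.dump(sorted(self.done), fh)


def migrate(files, stage, transfer, checkpoint, transfer_workers=4):
    """Stage files one by one while earlier files transfer in parallel.

    `stage` and `transfer` are hypothetical callables wrapping external
    tools; each takes a file name and raises an exception on failure.
    Returns the list of (file, step) pairs that failed in this run.
    """
    staged = queue.Queue()
    failed = []
    failed_lock = threading.Lock()

    def run_step(name, step, action):
        if checkpoint.is_done(name, step):
            return True  # completed in an earlier run, skip it
        try:
            action(name)
        except Exception:
            with failed_lock:
                failed.append((name, step))
            return False
        checkpoint.mark(name, step)
        return True

    def stager():
        # Stage 1 of the pipeline: copy each file to temporary disk.
        for name in files:
            if run_step(name, "stage", stage):
                staged.put(name)
        staged.put(None)  # sentinel: nothing more to transfer

    with ThreadPoolExecutor(max_workers=transfer_workers) as pool:
        threading.Thread(target=stager, daemon=True).start()
        # Stage 2: transfers start as soon as a file has been staged.
        while (name := staged.get()) is not None:
            pool.submit(run_step, name, "transfer", transfer)
    return failed
```

Calling `migrate` a second time with a `Checkpoint` on the same file retries only the steps that did not complete. The sketch assumes a staged copy is still on disk when its transfer is retried; a real workflow would need to verify that.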

Similar resources

Workflow Composition: Semantic Representations for Flexible Automation

Many different kinds of users may need to compose scientific workflows for different purposes. This chapter focuses on the requirements and challenges of scientific workflow composition. They are motivated by our work with two particular application domains: physics-based seismic hazard analysis (Chapter 10) and data-intensive natural language processing [1]. Our research on workflow creation s...

Climate Science Performance, Data and Productivity on Titan

Climate Science models are flagship codes for the largest of high performance computing (HPC) resources, both in visibility, with the newly launched Department of Energy (DOE) Accelerated Climate Model for Energy (ACME) effort, and in terms of significant fractions of system usage. The performance of the DOE ACME model is captured with application level timers and examined through a sizeable ru...

Election Workflow Automation - Canadian Experiences

Democratic parliamentary and presidential voting supported by election systems worldwide represents the essential idea behind any free society. In recent years, numerous challenges have been overcome to satisfy this fundamental principle. On one side we have low voter turnout and high electors migration, on the other, sometimes complex electoral systems such as preferential or transferable ball...

Process-driven Management Information Systems - Combining Data Warehouse and Workflow Technology

The use of workflow technology promises efficiency gains through the automation of manual routing, coordination and work distribution tasks. During the execution of workflows, state-changes of the workflow engine are recorded in a log file or database, the so-called audit trail. Combined with business object data, the audit trail provides exact and timely information about the operational behav...

A Calculus for Propagating Semantic Annotations Through Scientific Workflow Queries

Scientific workflows facilitate automation, reuse, and reproducibility of scientific data management and analysis tasks. Scientific workflows are often modeled as dataflow networks, chaining together processing components (called actors) that query, transform, analyse, and visualize scientific datasets. Semantic annotations relate data and actor schemas with conceptual information from a shared...

Publication date: 2007